The following listing of claims will replace all prior versions, and listings, of claims in the application.


Listing of Claims: 

The claims 1-16 currently pending in the application are as follows. 


1. (currently amended) A video conferencing system comprising:
a stationary image pickup device, remaining motionless during operation, for generating image signals representative of an image;
an audio pickup device for generating audio signals representative of sound from an audio source, wherein said audio pickup device is configured to locate an audio source when said audio source is stationary and to track said audio source when said audio source is nonstationary; and
a multimodal integration architecture system for processing said image signals and said audio signals to determine a direction of the audio source relative to a reference point, wherein said multimodal integration architecture system is adapted to track the direction of said audio source when said audio source produces sound and when said audio source does not produce sound.


2. (original) The video conferencing system of claim 1 wherein said multimodal integration architecture system further comprises:
an audio source localization system;
a computer vision person detection system; and
a multimodal speaker detection system.

3. (original) The video conferencing system of claim 2, further comprising an integrated housing for an integrated video conferencing system incorporating the image pickup device, the audio pickup device, and the multimodal integration architecture system.

4. (original) The video conferencing system of claim 3, wherein the integrated housing is sized for being portable.

5. (original) The video conferencing system of claim 2, further comprising an electronic pan tilt zoom system for electronically manipulating the image signals to effectively provide at least one of variable pan, tilt, and zoom functions.

6. (original) The video conferencing system of claim 5, wherein the image pickup device is a stationary camera.

7. (original) The video conferencing system of claim 5, wherein the multimodal integrated architecture system provides control signals to the electronic pan tilt zoom system.

8. (original) The video conferencing system of claim 7, wherein the audio source moves relative to the reference point, the audio source localization system detects the movement of the audio source and, in response to the movement, the audio source localization system causes a change in the field of view of the image pickup device.

9. (original) The video conferencing system of claim 5, wherein the audio pickup device is comprised of an array of two microphones.

10. (currently amended) A method comprising the steps of:
generating, at a stationary image pickup device, remaining motionless during operation, image signals representative of an image;
generating, at an audio pickup device, audio signals representative of sound from an audio source, wherein said audio pickup device is configured to locate an audio source when said audio source is stationary and to track said audio source when said audio source is nonstationary;
processing the image signals and the audio signals to determine a direction of the audio source relative to a reference point, wherein the processing is adapted to track the direction of said audio source when said audio source produces sound and when said audio source does not produce sound;
manipulating the image signals to produce refined image signals; and
outputting said refined image signals.

11. (currently amended) The method of claim 10 further comprising the steps of:
applying said audio signals to an audio source localization system;
applying said image signals to a computer vision person detection system;
processing said audio signals and said image signals with a multimodal speaker detection system;
generating control signals based on the determined direction of the audio source;
applying the control signals to an electronic pan tilt zoom system to mimic the effect of at least one function of a movable camera, said function selected from the group consisting of panning, tilting, and zooming said movable camera; and
providing an output from said electronic pan tilt zoom system.

12. (original) The method of claim 10, further comprising electronically varying a field of view of the image pickup device in response to the control signals.

13. (original) The method of claim 10, wherein processing the audio signals includes determining an audio based direction of the audio source based on the audio signals.

14. (original) The method of claim 12, wherein the audio source moves relative to a reference point, and wherein processing the audio signals further includes:
detecting the movement of the audio source; and
causing electronically, in response to the movement, an increase in the field of view of the image pickup device.

15. (original) The method of claim 12, further comprising the step of supplying control signals, based on the audio based direction, for electronically panning, tilting, or zooming said image pickup device.

16. (currently amended) A video conferencing system comprising:
two microphones for generating audio signals representative of sound from a speaker, wherein said microphones are configured to locate a speaker when said speaker is stationary and to track said speaker when said speaker is nonstationary;
a stationary video camera, remaining motionless during operation, for generating video signals representative of a video image;
an electronic pan tilt zoom system for manipulating video images to produce the visual effects of panning, tilting, and/or zooming;
a processor for processing the video signals and the audio signals to determine a direction of the speaker relative to a reference point and supplying control signals to the electronic pan tilt zoom system for producing images that include the speaker in the field of view of the camera, the control signals being generated based on the determined direction of the speaker, wherein said processor is adapted to track the direction of said speaker when said speaker produces sound and when said speaker does not produce sound; and
a transmitter for transmitting audio and video signals for video conferencing.